97 research outputs found

    Mean-Field Theory of Meta-Learning

    Full text link
    We discuss here the mean-field theory for a cellular automata model of meta-learning. The meta-learning is the process of combining outcomes of individual learning procedures in order to determine the final decision with higher accuracy than any single learning method. Our method is constructed from an ensemble of interacting, learning agents, that acquire and process incoming information using various types, or different versions of machine learning algorithms. The abstract learning space, where all agents are located, is constructed here using a fully connected model that couples all agents with random strength values. The cellular automata network simulates the higher level integration of information acquired from the independent learning trials. The final classification of incoming input data is therefore defined as the stationary state of the meta-learning system using simple majority rule, yet the minority clusters that share opposite classification outcome can be observed in the system. Therefore, the probability of selecting proper class for a given input data, can be estimated even without the prior knowledge of its affiliation. The fuzzy logic can be easily introduced into the system, even if learning agents are build from simple binary classification machine learning algorithms by calculating the percentage of agreeing agents.Comment: 23 page

    Support vector machine model for diagnosis of lymph node metastasis in gastric cancer with multidetector computed tomography: a preliminary study

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Lymph node metastasis (LNM) of gastric cancer is an important prognostic factor regarding long-term survival. But several imaging techniques which are commonly used in stomach cannot satisfactorily assess the gastric cancer lymph node status. They can not achieve both high sensitivity and specificity. As a kind of machine-learning methods, Support Vector Machine has the potential to solve this complex issue.</p> <p>Methods</p> <p>The institutional review board approved this retrospective study. 175 consecutive patients with gastric cancer who underwent MDCT before surgery were included. We evaluated the tumor and lymph node indicators on CT images including serosal invasion, tumor classification, tumor maximum diameter, number of lymph nodes, maximum lymph node size and lymph nodes station, which reflected the biological behavior of gastric cancer. Univariate analysis was used to analyze the relationship between the six image indicators with LNM. A SVM model was built with these indicators above as input index. The output index was that lymph node metastasis of the patient was positive or negative. It was confirmed by the surgery and histopathology. A standard machine-learning technique called k-fold cross-validation (5-fold in our study) was used to train and test SVM models. We evaluated the diagnostic capability of the SVM models in lymph node metastasis with the receiver operating characteristic (ROC) curves. And the radiologist classified the lymph node metastasis of patients by using maximum lymph node size on CT images as criterion. We compared the areas under ROC curves (AUC) of the radiologist and SVM models.</p> <p>Results</p> <p>In 175 cases, the cases of lymph node metastasis were 134 and 41 cases were not. The six image indicators all had statistically significant differences between the LNM negative and positive groups. The means of the sensitivity, specificity and AUC of SVM models with 5-fold cross-validation were 88.5%, 78.5% and 0.876, respectively. While the diagnostic power of the radiologist classifying lymph node metastasis by maximum lymph node size were only 63.4%, 75.6% and 0.757. Each SVM model of the 5-fold cross-validation performed significantly better than the radiologist.</p> <p>Conclusions</p> <p>Based on biological behavior information of gastric cancer on MDCT images, SVM model can help diagnose the lymph node metastasis preoperatively.</p

    Support vector machine versus logistic regression modeling for prediction of hospital mortality in critically ill patients with haematological malignancies

    Get PDF
    Background: Several models for mortality prediction have been constructed for critically ill patients with haematological malignancies in recent years. These models have proven to be equally or more accurate in predicting hospital mortality in patients with haematological malignancies than ICU severity of illness scores such as the APACHE II or SAPS II [1]. The objective of this study is to compare the accuracy of predicting hospital mortality in patients with haematological malignancies admitted to the ICU between models based on multiple logistic regression (MLR) and support vector machine (SVM) based models. Methods: 352 patients with haematological malignancies admitted to the ICU between 1997 and 2006 for a life-threatening complication were included. 252 patient records were used for training of the models and 100 were used for validation. In a first model 12 input variables were included for comparison between MLR and SVM. In a second more complex model 17 input variables were used. MLR and SVM analysis were performed independently from each other. Discrimination was evaluated using the area under the receiver operating characteristic (ROC) curves (+/- SE). Results: The area under ROC curve for the MLR and SVM in the validation data set were 0.768 (+/- 0.04) vs. 0.802 (+/- 0.04) in the first model (p = 0.19) and 0.781 (+/- 0.05) vs. 0.808 (+/- 0.04) in the second more complex model (p = 0.44). SVM needed only 4 variables to make its prediction in both models, whereas MLR needed 7 and 8 variables in the first and second model respectively. Conclusion: The discriminative power of both the MLR and SVM models was good. No statistically significant differences were found in discriminative power between MLR and SVM for prediction of hospital mortality in critically ill patients with haematological malignancies

    Prediction of potential drug targets based on simple sequence properties

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>During the past decades, research and development in drug discovery have attracted much attention and efforts. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and pharmaceutical industry. If a protein can be predicted in advance for its potential application as a drug target, the drug discovery process targeting this protein will be greatly speeded up. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets.</p> <p>Results</p> <p>Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non drug targets at an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research.</p> <p>Conclusion</p> <p>We have developed a drug target prediction method based solely on protein sequence information without the knowledge of family/domain annotation, or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as genome scale drug target predictions.</p

    In silico approach to screen compounds active against parasitic nematodes of major socio-economic importance

    Get PDF
    Infections due to parasitic nematodes are common causes of morbidity and fatality around the world especially in developing nations. At present however, there are only three major classes of drugs for treating human nematode infections. Additionally the scientific knowledge on the mechanism of action and the reason for the resistance to these drugs is poorly understood. Commercial incentives to design drugs that are endemic to developing countries are limited therefore, virtual screening in academic settings can play a vital role is discovering novel drugs useful against neglected diseases. In this study we propose to build robust machine learning model to classify and screen compounds active against parasitic nematodes.A set of compounds active against parasitic nematodes were collated from various literature sources including PubChem while the inactive set was derived from DrugBank database. The support vector machine (SVM) algorithm was used for model development, and stratified ten-fold cross validation was used to evaluate the performance of each classifier. The best results were obtained using the radial basis function kernel. The SVM method achieved an accuracy of 81.79% on an independent test set. Using the model developed above, we were able to indentify novel compounds with potential anthelmintic activity.In this study, we successfully present the SVM approach for predicting compounds active against parasitic nematodes which suggests the effectiveness of computational approaches for antiparasitic drug discovery. Although, the accuracy obtained is lower than the previously reported in a similar study but we believe that our model is more robust because we intentionally employed stringent criteria to select inactive dataset thus making it difficult for the model to classify compounds. The method presents an alternative approach to the existing traditional methods and may be useful for predicting hitherto novel anthelmintic compounds.12 page(s

    Korarchaeota Diversity, Biogeography, and Abundance in Yellowstone and Great Basin Hot Springs and Ecological Niche Modeling Based on Machine Learning

    Get PDF
    Over 100 hot spring sediment samples were collected from 28 sites in 12 areas/regions, while recording as many coincident geochemical properties as feasible (>60 analytes). PCR was used to screen samples for Korarchaeota 16S rRNA genes. Over 500 Korarchaeota 16S rRNA genes were screened by RFLP analysis and 90 were sequenced, resulting in identification of novel Korarchaeota phylotypes and exclusive geographical variants. Korarchaeota diversity was low, as in other terrestrial geothermal systems, suggesting a marine origin for Korarchaeota with subsequent niche-invasion into terrestrial systems. Korarchaeota endemism is consistent with endemism of other terrestrial thermophiles and supports the existence of dispersal barriers. Korarchaeota were found predominantly in >55°C springs at pH 4.7–8.5 at concentrations up to 6.6×106 16S rRNA gene copies g−1 wet sediment. In Yellowstone National Park (YNP), Korarchaeota were most abundant in springs with a pH range of 5.7 to 7.0. High sulfate concentrations suggest these fluids are influenced by contributions from hydrothermal vapors that may be neutralized to some extent by mixing with water from deep geothermal sources or meteoric water. In the Great Basin (GB), Korarchaeota were most abundant at spring sources of pH<7.2 with high particulate C content and high alkalinity, which are likely to be buffered by the carbonic acid system. It is therefore likely that at least two different geological mechanisms in YNP and GB springs create the neutral to mildly acidic pH that is optimal for Korarchaeota. A classification support vector machine (C-SVM) trained on single analytes, two analyte combinations, or vectors from non-metric multidimensional scaling models was able to predict springs as Korarchaeota-optimal or sub-optimal habitats with accuracies up to 95%. To our knowledge, this is the most extensive analysis of the geochemical habitat of any high-level microbial taxon and the first application of a C-SVM to microbial ecology
    corecore